Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
We propose a novel lightweight generative adversarial network for efficient image manipulation using natural language descriptions. To achieve this, we propose a new word-level discriminator that gives the generator fine-grained training feedback at the word level. This feedback makes it possible to train a lightweight generator that has a small number of parameters but can still correctly focus on specific visual attributes of an image and edit them without affecting other content that is not described in the text. Furthermore, thanks to the explicit training signal related to each word, the discriminator can also be simplified to have a lightweight structure. Compared with the state of the art, our method has a much smaller number of parameters but still achieves competitive manipulation performance. Extensive experimental results demonstrate that our method can better disentangle different visual attributes, correctly map them to the corresponding semantic words, and thus achieve more accurate image modification using natural language descriptions.
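The abstract does not spell out how word-level feedback is computed, so the following is only a minimal sketch of one common way to score an image against each word separately, and is not the authors' architecture. It assumes each word and each image region is already embedded in a shared vector space. The function name `word_level_scores` and the attention-plus-cosine scoring rule are hypothetical choices made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(dot(a, a))
    nb = math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def word_level_scores(word_embs, region_feats):
    """Hypothetical word-level scoring (an assumption, not the paper's method).

    For each word, attend over the image regions, build a word-specific
    image context vector, and score the word by its cosine similarity
    to that context. Returns one score in [-1, 1] per word.
    """
    dim = len(region_feats[0])
    scores = []
    for w in word_embs:
        # Attention weights: how strongly each region responds to this word.
        attn = softmax([dot(w, r) for r in region_feats])
        # Attention-weighted sum of the region features.
        context = [
            sum(a * r[d] for a, r in zip(attn, region_feats))
            for d in range(dim)
        ]
        scores.append(cosine(w, context))
    return scores


# Toy data: two image regions and two words in a 2-D embedding space.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[5.0, 0.0],    # aligned with the first region
         [-1.0, -1.0]]  # opposed to both regions
scores = word_level_scores(words, regions)
```

In this toy case the first word receives a score close to 1 and the second a score close to -1. A discriminator trained on per-word scores of this kind would give the generator a separate signal for each word rather than a single sentence-level judgement, which is the kind of fine-grained feedback the abstract describes.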
Review for NeurIPS paper: Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
Weaknesses: - The technical novelty of the proposed method is somewhat incremental, since it is largely based on the work in [14] with some modifications to the generator and discriminator architectures. The word-level training feedback in the discriminator seems to be the main technical contribution, but it is not ground-breaking, as it extends the auxiliary classifier in conditional GAN with multiple classes (i.e. Specifically, only nouns and adjectives are manually chosen as text-relevant attributes, which convey a very limited context of general descriptions. Although this may allow fine control of the image content in a limited context, it reduces the capability of aligning the rich context of the text to the image, which is often available in approaches that learn to encode the whole sentence (e.g. Although the authors offer some justification in Section 3.2.1 for using a heuristic approach, this assumption does not seem to hold in general. Current comparisons are mostly focused on ManiGAN.
Review for NeurIPS paper: Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
The paper proposes a novel text-guided image manipulation method built on a word-level discriminator loss. The proposed method is faster and requires less memory than existing models, and the experimental results show improvements over the baseline method (ManiGAN). The paper initially received mixed ratings, but the concerns were addressed by the rebuttal and all reviewers converged in favor of acceptance. The authors should revise the paper to reflect the reviewers' suggestions, as promised in the rebuttal. NOTE FROM PROGRAM CHAIRS: For the camera-ready version, please expand your broader impact statement to discuss the potential negative impacts of your work, such as forgery and deepfakes, as well as possible mitigations.